Abstract

One of the most popular lottery games worldwide is the so-called "lotto k/N".
It considers N numbers 1, 2, ..., N from which k are drawn randomly, without
replacement. A player selects k or more numbers and the first prize is shared
amongst those players whose selected numbers match all of the k randomly drawn.
Exact rules may vary in different countries.

In this paper, mean values and covariances for the random variables representing 
the numbers drawn from this kind of game are presented, with the aim of using 
them to audit statistically the consistency of a given sample of historical results with 
theoretical values coming from a hypergeometric statistical model. The method can 
be adapted to test pseudorandom number generators. 
1 Introduction



The concept of chance has a long history, but, as far as we know, early scientists
and mathematicians working in the Hellenistic period did not develop either
a theory of probability or statistical methods [1]. Gambling became more and
more popular in Europe in the XVIIth century, due to the emergence of a class
of people affluent enough to travel across the continent and waste money on
such games. Games of chance were at the origin of a new wave of interest in the
rules of chance [1] and fostered the first rigorous results in Probability Theory.
Among all the games of chance, lotteries have been and still are very popular. 
They are used by governments to levy indirect taxes on very poor people. It is 
not clear when the first European lottery games started, but it seems that they 
could have been already present in the XVth century. Influential names in the 
history of science, such as D'Alembert, Euler, D. Bernoulli, Huygens, Leibniz, 
Laplace and many others analyzed lotteries for practical purposes, such as 
designing them and optimizing governmental collected revenues, but also with 
theoretical goals in mind, helping to accelerate the development of Statistics 
and Probability Theory. A very interesting account of the history of lotteries
emphasizes the role of Genoa (an Italian Sea Republic of the Middle Ages)
in introducing state-run lotteries [2]. That paper includes further
references.



Nowadays, the analysis, design and simulation of lottery games continue to be
an active research area, mainly for statisticians and economists [3,4,5,6].
Lottery games are also studied as a suitable tool for teaching elementary
probability theory and Statistics [7,8], and new theoretical results have been
obtained recently [9].



In this work we present a statistical data analysis of randomness of Mexican 
and Italian lotteries; although, strictly speaking, it is known that there is no 
way to "prove" the randomness of a sequence of numbers [10], it is always 
possible to statistically test whether or not historical results exhibit the quan- 
titative properties derived from the probabilistic model assumed to explain 
the selection mechanism. In this respect, the statistical procedure presented 
here could be easily used as a test of pseudorandom number generators. 
2 Theory



2.1 Probabilities 



Readers may find the following references useful to understand the material
presented in this section: [11,12,13] for Probability and combinatorial
analysis, and [14] for Statistics.

The total number of possible combinations of k objects chosen from a set of
N objects is given by the combinatorial coefficient "N choose k":

\[ \binom{N}{k} = \frac{N!}{k!\,(N-k)!}. \]

We denote by X the random variable corresponding to the number of matches
out of the k randomly drawn numbers. Here, we use the hypergeometric model,
and we prefer the technical term "fairness" to "equiprobability": strictly
speaking, all the lottery numbers are exchangeable, but the odds of extracting
them do not follow the uniform distribution, because sampling is without
replacement. Under this model, the probability of matching exactly i numbers,
out of the k selected by the player, is given by [15]

\[ P[X = i] = \frac{\binom{k}{i}\binom{N-k}{k-i}}{\binom{N}{k}}, \tag{1} \]

where i = 0, ..., k.
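As an illustration of Eq. (1), the following minimal sketch evaluates the hypergeometric match probabilities exactly for a 6/51 game and checks that they add up to one. The function name `match_probability` is our own choice for this example and does not come from the references.

```python
from fractions import Fraction
from math import comb


def match_probability(i: int, k: int, N: int) -> Fraction:
    """P[X = i] from Eq. (1): exactly i of the player's k numbers
    are among the k numbers drawn without replacement from 1..N."""
    return Fraction(comb(k, i) * comb(N - k, k - i), comb(N, k))


if __name__ == "__main__":
    k, N = 6, 51  # illustrative choice: the 6/51 scheme
    probs = [match_probability(i, k, N) for i in range(k + 1)]
    # The probabilities of 0, 1, ..., k matches must sum to exactly 1.
    assert sum(probs) == 1
    for i, p in enumerate(probs):
        print(f"P[X = {i}] = {float(p):.3e}")
```

Exact rational arithmetic is used here only to make the normalization check free of rounding error; floating point would be adequate for routine use.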

In order to test the hypothesis of fairness, we consider a multivariate test on
the mean parameter of the random variable $Y' = (Y_{(1)}, \ldots, Y_{(k)})$, the
sorted outcome vector. Here, $Y_{(i)}$ denotes the random variable corresponding
to the number in the i-th place (recall that the randomly selected numbers are
put in ascending order, i.e., $Y_{(1)} < Y_{(2)} < \cdots < Y_{(k)}$). The
probability that the i-th number corresponds to the value r is calculated from
Eq. (1) with a suitable choice of parameters. In fact, $Y_{(i)} = r$ if and only
if i - 1 numbers fall between 1 and r - 1, and k - i numbers fall between r + 1
and N. Therefore,

\[ P[Y_{(i)} = r] = \frac{\binom{r-1}{i-1}\binom{N-r}{k-i}}{\binom{N}{k}}. \tag{2} \]



The joint probability that the i-th and j-th numbers have the values r and
s, respectively, is

\[ P[Y_{(i)} = r,\, Y_{(j)} = s] = \frac{\binom{r-1}{i-1}\binom{s-r-1}{j-i-1}\binom{N-s}{k-j}}{\binom{N}{k}} \tag{3} \]

for i < j and r < s.
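The counting argument behind Eqs. (2) and (3) can be checked numerically. The sketch below (function names are ours, chosen for illustration) codes the marginal and joint probabilities of the order statistics and verifies two consistency properties: the marginal sums to one, and summing the joint probability over s recovers the marginal.

```python
from fractions import Fraction
from math import comb


def pmf_order_stat(i: int, r: int, k: int, N: int) -> Fraction:
    """P[Y_(i) = r]: i-1 numbers below r and k-i numbers above r (Eq. 2)."""
    return Fraction(comb(r - 1, i - 1) * comb(N - r, k - i), comb(N, k))


def joint_pmf(i: int, j: int, r: int, s: int, k: int, N: int) -> Fraction:
    """P[Y_(i) = r, Y_(j) = s] for i < j and r < s (Eq. 3)."""
    ways = comb(r - 1, i - 1) * comb(s - r - 1, j - i - 1) * comb(N - s, k - j)
    return Fraction(ways, comb(N, k))


if __name__ == "__main__":
    k, N = 6, 51
    # Each marginal distribution is a proper probability distribution.
    for i in range(1, k + 1):
        assert sum(pmf_order_stat(i, r, k, N) for r in range(1, N + 1)) == 1
    # Summing the joint pmf over s gives back the marginal of Y_(i).
    i, j, r = 2, 5, 10
    marginal = sum(joint_pmf(i, j, r, s, k, N) for s in range(r + 1, N + 1))
    assert marginal == pmf_order_stat(i, r, k, N)
    print("Eqs. (2) and (3) are mutually consistent for the 6/51 game")
```

`math.comb` returns zero when the lower index exceeds the upper one, so values of r or s outside the support contribute nothing and no explicit range checks are needed.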



2.2 First and second order moments 



The expected value of the i-th number in the sorted outcome vector is:

\[ E[Y_{(i)}] = \binom{N}{k}^{-1} \sum_{r=i}^{N-k+i} r \binom{r-1}{i-1}\binom{N-r}{k-i} \tag{4} \]

and the expected value of its square is:

\[ E[Y_{(i)}^2] = \binom{N}{k}^{-1} \sum_{r=i}^{N-k+i} r^2 \binom{r-1}{i-1}\binom{N-r}{k-i}. \tag{5} \]

Its variance is then obtained as

\[ \mathrm{Var}[Y_{(i)}] = E[Y_{(i)}^2] - \{E[Y_{(i)}]\}^2. \tag{6} \]



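Finally, the covariance between the values appearing in the i-th and j-th
places can be calculated for i < j as

\[ \mathrm{Cov}[Y_{(i)}, Y_{(j)}] = E[Y_{(i)} Y_{(j)}] - E[Y_{(i)}]\, E[Y_{(j)}], \tag{7} \]

where

\[ E[Y_{(i)} Y_{(j)}] = \binom{N}{k}^{-1} \sum_{r=i}^{N-k+i} \; \sum_{s=r+1}^{N-k+j} r\, s \binom{r-1}{i-1}\binom{s-r-1}{j-i-1}\binom{N-s}{k-j}. \tag{8} \]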



Using the above results, we find that under fairness the i-th component of
the vector $\mu = E[Y]$ is just

\[ \mu_i = E[Y_{(i)}] = \frac{(N+1)\, i}{k+1}, \qquad i = 1, \ldots, k. \tag{9} \]

On the other hand, the covariance matrix V = Var[Y] has elements

\[ \mathrm{Cov}[Y_{(i)}, Y_{(j)}] = \frac{i\,(k-j+1)(N+1)(N-k)}{(k+1)^2 (k+2)} \tag{10} \]

for $1 \le i \le j \le k$.
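A direct way to gain confidence in the closed forms (9) and (10) is to compare them with brute-force sums over the order-statistic probabilities. The sketch below does this in exact rational arithmetic; the function names and the small test case are our own illustrative choices.

```python
from fractions import Fraction
from math import comb


def moments_by_summation(k: int, N: int):
    """Means and (co)variances of the order statistics by explicit summation."""
    total = comb(N, k)
    rng = range(1, N + 1)
    mean = [Fraction(sum(r * comb(r - 1, i - 1) * comb(N - r, k - i) for r in rng), total)
            for i in range(1, k + 1)]
    cov = {}
    for i in range(1, k + 1):
        for j in range(i, k + 1):
            if i == j:  # second moment of a single order statistic
                cross = sum(r * r * comb(r - 1, i - 1) * comb(N - r, k - i) for r in rng)
            else:       # cross moment from the joint probability, i < j and r < s
                cross = sum(r * s * comb(r - 1, i - 1) * comb(s - r - 1, j - i - 1)
                            * comb(N - s, k - j)
                            for r in rng for s in range(r + 1, N + 1))
            cov[i, j] = Fraction(cross, total) - mean[i - 1] * mean[j - 1]
    return mean, cov


def moments_closed_form(k: int, N: int):
    """Closed-form mean vector and covariances under fairness."""
    mean = [Fraction((N + 1) * i, k + 1) for i in range(1, k + 1)]
    cov = {(i, j): Fraction(i * (k - j + 1) * (N + 1) * (N - k), (k + 1) ** 2 * (k + 2))
           for i in range(1, k + 1) for j in range(i, k + 1)}
    return mean, cov


if __name__ == "__main__":
    for k, N in [(4, 12), (5, 20)]:  # small games keep the double sums cheap
        assert moments_by_summation(k, N) == moments_closed_form(k, N)
        print(f"closed forms agree with direct summation for lotto {k}/{N}")
```

The comparison is an equality of fractions, so agreement is exact rather than approximate; larger games such as 6/51 can be checked the same way at a higher computational cost.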



Remark. Often, the rules of the game allow for the selection of an additional
number, called bonus number. In such a case, the formulae above must be
slightly modified. For instance, Eq. (1) assumes the following expression:

\[ P'[X = i] = \frac{\binom{k}{i}\binom{N-k-1}{k-i-1}}{\binom{N}{k}}. \tag{11} \]

However, in this paper, we do not consider this situation, and in any case,
bonus numbers do not affect the distribution of the order statistics.



2.3 Examples: lotto 6/51 and 5/90 



As an illustration, we present the explicit mean and variance/covariance ma- 
trix in two settings: the case N = 51 and k = 6, as an example of the Mexican 
game, and the case N = 90 and k = 5 from the Italian game. Notice that 
we give the inverse variance/covariance matrices as they are involved in the 
chi-squared test statistics. 

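For the 6/51 game, the mean is

\[ \mu' = \left( \frac{52}{7},\; \frac{104}{7},\; \frac{156}{7},\; \frac{208}{7},\; \frac{260}{7},\; \frac{312}{7} \right) \]

and the inverse variance/covariance matrix is the tri-diagonal matrix

\[ V^{-1} = \frac{1}{585}
\begin{pmatrix}
 28 & -14 &   0 &   0 &   0 &   0 \\
-14 &  28 & -14 &   0 &   0 &   0 \\
  0 & -14 &  28 & -14 &   0 &   0 \\
  0 &   0 & -14 &  28 & -14 &   0 \\
  0 &   0 &   0 & -14 &  28 & -14 \\
  0 &   0 &   0 &   0 & -14 &  28
\end{pmatrix}. \]

When k = 5 and N = 90, the mean vector is

\[ \mu' = \left( \frac{91}{6},\; \frac{182}{6},\; \frac{273}{6},\; \frac{364}{6},\; \frac{455}{6} \right) \]

and

\[ V^{-1} = \frac{1}{1105}
\begin{pmatrix}
 12 &  -6 &   0 &   0 &   0 \\
 -6 &  12 &  -6 &   0 &   0 \\
  0 &  -6 &  12 &  -6 &   0 \\
  0 &   0 &  -6 &  12 &  -6 \\
  0 &   0 &   0 &  -6 &  12
\end{pmatrix}. \]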
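The tri-diagonal structure of the inverse can be checked mechanically. The sketch below builds V from Eq. (10) and multiplies it by a candidate inverse of the form scale x tridiag(-1, 2, -1); the scale factor (k+1)(k+2)/((N+1)(N-k)) is our own simplification of Eq. (10), stated here as an assumption and confirmed only by the identity check in the code.

```python
from fractions import Fraction


def cov_matrix(k: int, N: int):
    """Covariance matrix of the order statistics, filled in from Eq. (10)."""
    c = Fraction((N + 1) * (N - k), (k + 1) ** 2 * (k + 2))
    return [[c * min(i, j) * (k + 1 - max(i, j)) for j in range(1, k + 1)]
            for i in range(1, k + 1)]


def tridiagonal_inverse(k: int, N: int):
    """Candidate inverse: scale * tridiag(-1, 2, -1) (assumed form, verified below)."""
    scale = Fraction((k + 1) * (k + 2), (N + 1) * (N - k))
    return [[scale * (2 if i == j else -1 if abs(i - j) == 1 else 0)
             for j in range(k)] for i in range(k)]


def mat_mul(A, B):
    """Plain matrix product of two lists of rows."""
    return [[sum(a * b for a, b in zip(row, col)) for col in zip(*B)] for row in A]


def is_identity(M) -> bool:
    n = len(M)
    return all(M[i][j] == (1 if i == j else 0) for i in range(n) for j in range(n))


if __name__ == "__main__":
    for k, N in [(6, 51), (5, 90)]:
        mean = [Fraction((N + 1) * i, k + 1) for i in range(1, k + 1)]
        V_inv = tridiagonal_inverse(k, N)
        # V times the candidate inverse must be exactly the identity matrix.
        assert is_identity(mat_mul(cov_matrix(k, N), V_inv))
        print(f"lotto {k}/{N}: mean =", ", ".join(str(m) for m in mean))
        print(f"  V^-1 diagonal = {V_inv[0][0]}, off-diagonal = {V_inv[0][1]}")
```

Fractions are printed in lowest terms, so the mean entries may appear reduced (for example 91/3 instead of 182/6).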
3 Hypothesis testing



Let us denote by $y_1, \ldots, y_m$ the observed outcome vectors from m games,
and by $\bar{y}$ the corresponding average.

To test the null hypothesis $E[Y] = \mu$, we use both an asymptotic approach
and a Monte Carlo one.

With the asymptotic approach, we make use of the multivariate central limit
theorem, see [14], Chapter 11. Therefore, under the null hypothesis the
quantity

\[ Q = m\, (\bar{y} - \mu)'\, V^{-1} (\bar{y} - \mu) \tag{12} \]

converges in distribution to a chi-square distribution with k degrees of freedom,
denoted by $\chi^2_{(k)}$. Should the data exhibit departures from the known mean
vector and/or variance/covariance matrix, the quantity Q will show departures
from the $\chi^2_{(k)}$ distribution. Thus, a test for the parameters can be
performed by computing Q from a sample of m previous results, and calculating
the associated p-value based on the $\chi^2_{(k)}$ distribution.
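The asymptotic test can be sketched with the standard library alone. The code below is an illustrative implementation, not the authors' program: it builds the mean vector and covariance matrix from Eqs. (9) and (10), evaluates Q by solving a small linear system, and computes the chi-square upper tail through the classical finite-sum expressions for integer degrees of freedom (our choice, to avoid external dependencies). The sample averages in the example are those of the P4 row of Table 1.

```python
import math


def theoretical_mean(k, N):
    return [(N + 1) * i / (k + 1) for i in range(1, k + 1)]


def theoretical_cov(k, N):
    c = (N + 1) * (N - k) / ((k + 1) ** 2 * (k + 2))
    return [[c * min(i, j) * (k + 1 - max(i, j)) for j in range(1, k + 1)]
            for i in range(1, k + 1)]


def solve(A, b):
    """Solve A x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    M = [row[:] + [b[i]] for i, row in enumerate(A)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(M[r][col]))
        M[col], M[pivot] = M[pivot], M[col]
        for r in range(col + 1, n):
            f = M[r][col] / M[col][col]
            for c in range(col, n + 1):
                M[r][c] -= f * M[col][c]
    x = [0.0] * n
    for i in reversed(range(n)):
        x[i] = (M[i][n] - sum(M[i][j] * x[j] for j in range(i + 1, n))) / M[i][i]
    return x


def q_statistic(ybar, m, k, N):
    """Q = m (ybar - mu)' V^{-1} (ybar - mu), computed as m * d . (V^{-1} d)."""
    d = [y - mu for y, mu in zip(ybar, theoretical_mean(k, N))]
    x = solve(theoretical_cov(k, N), d)
    return m * sum(di * xi for di, xi in zip(d, x))


def chi2_sf(x, df):
    """P[chi-square with integer df > x], via finite sums (even and odd df)."""
    if df % 2 == 0:
        term, total = 1.0, 1.0
        for r in range(1, df // 2):
            term *= x / (2 * r)
            total += term
        return math.exp(-x / 2) * total
    chi = math.sqrt(x)
    term, total = chi, 0.0
    for r in range(1, (df + 1) // 2):
        total += term
        term *= x / (2 * r + 1)
    phi = math.exp(-x / 2) / math.sqrt(2 * math.pi)
    return math.erfc(chi / math.sqrt(2)) + 2 * phi * total


if __name__ == "__main__":
    ybar = [7.739, 14.104, 22.038, 30.227, 37.564, 45.635]  # Table 1, period P4
    Q = q_statistic(ybar, m=211, k=6, N=51)
    print(f"Q = {Q:.2f}, CLT p-value = {chi2_sf(Q, 6):.4f}")
```

Because the input averages are rounded to three decimals, the computed Q and p-value should agree with the tabulated ones only up to rounding.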

With the Monte Carlo approach, we approximate the distribution of Q under
the null hypothesis through the random generation of 5,000 values of Q, each
based on the same sample size as the observed draws.
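A minimal sketch of the Monte Carlo approach follows. It simulates fair draws with Python's `random.sample`, recomputes Q for each simulated history, and reports the fraction of simulated values at least as large as the observed one. Two points are assumptions of this sketch rather than details taken from the text: the quadratic form is evaluated through the tri-diagonal form of the inverse covariance matrix, with scale (k+1)(k+2)/((N+1)(N-k)), and the default seed and function names are arbitrary.

```python
import random


def q_statistic(ybar, m, k, N):
    """Q from the tri-diagonal inverse: m * scale * sum of squared increments
    of d = ybar - mu, padded with zeros at both ends (assumed closed form)."""
    d = [0.0] + [y - (N + 1) * i / (k + 1) for i, y in enumerate(ybar, start=1)] + [0.0]
    scale = (k + 1) * (k + 2) / ((N + 1) * (N - k))
    return m * scale * sum((b - a) ** 2 for a, b in zip(d, d[1:]))


def simulate_q(m, k, N, rng):
    """Simulate m fair draws of k numbers from 1..N and return their Q value."""
    sums = [0] * k
    for _ in range(m):
        draw = sorted(rng.sample(range(1, N + 1), k))  # sampling without replacement
        for idx, value in enumerate(draw):
            sums[idx] += value
    return q_statistic([s / m for s in sums], m, k, N)


def monte_carlo_p_value(q_obs, m, k, N, replicates=5000, seed=0):
    """Fraction of simulated Q values that are at least as large as q_obs."""
    rng = random.Random(seed)
    exceed = sum(simulate_q(m, k, N, rng) >= q_obs for _ in range(replicates))
    return exceed / replicates


if __name__ == "__main__":
    # Fewer replicates than the 5,000 quoted in the text, to keep the demo quick.
    p = monte_carlo_p_value(q_obs=18.25, m=211, k=6, N=51, replicates=1000)
    print(f"Monte Carlo p-value = {p:.4f}")
```

The estimate depends on the seed and on the number of replicates; with 5,000 replicates, a p-value near 0.005 rests on only a few dozen exceedances, so its last digits should not be over-interpreted.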



4 Numerical results 

4.1 The Mexican "melate" lotto game

In Mexico, a very popular game is the one known as melate. The historical
results are available at www.pronosticos.gob.mx, the official web-site of
"Pronosticos Deportivos para la Asistencia Publica".

The melate game was available to the Mexican public for the first time on
August 19th, 1984, with the scheme of selecting k = 6 numbers out of N = 39
until April 4th, 1993, when N was set to 44. On October 6th, 2002, the game
was again modified and N increased to 47. Another modification to this game
was made on December 4th, 2005, raising N to 51; this scheme lasted until
December 9, 2007, corresponding to draw number 2088. As of December 12, 2007,
N was raised to 56. For N = 51 the sample includes 211 results, from December
4, 2005 (draw number 1878) up to December 9, 2007 (draw number 2088). We denote
the 4 periods with N = 39, 44, 47 and 51 by P1, P2, P3, and P4.

Table 1 shows the sample average vectors for each type of game, computed
from the historical results.

Period    N     ȳ(1)     ȳ(2)     ȳ(3)     ȳ(4)     ȳ(5)     ȳ(6)   Draws
P1       39    5.634   11.679   17.195   22.859   28.699   34.153     555
P2       44    6.284   12.746   19.265   25.714   32.288   38.730     992
P3       47    6.964   13.579   20.591   27.691   34.379   41.161     330
P4       51    7.739   14.104   22.038   30.227   37.564   45.635     211

Table 1
Average results from the Mexican "melate" lotto game. August 19, 1984 to
December 30, 2007.

Using the parameter values found for each case, the Q-statistic defined in
Eq. (12) was calculated and the results are summarized in Table 2, together
with the asymptotic and Monte Carlo approximated p-values.

Period    N        Q    CLT p-value    MC p-value
P1       39     6.09         0.4127        0.3962
P2       44     2.50         0.8680        0.8746
P3       47     1.76         0.9403        0.9392
P4       51    18.25         0.0056        0.0066

Table 2
Calculated Q-statistic and associated p-values for each group of results from the
melate lotto game. CLT p-value is the Central Limit Theorem-based p-value and
MC p-value is the Monte Carlo approximated p-value.

As can be seen from Table 2, the historical results for N = 39, 44, 47 produce
small values of Q, with associated p-values which show statistical consistency
of the sample averages with their corresponding theoretical values.

However, from the 211 available results for N = 51, we found Q = 18.25
with an associated probability value of p = 0.0056, which constitutes strong
statistical evidence to conclude that the mechanism that generated the sample
is not consistent with the theoretical means and covariances.

Notice that the Monte Carlo p-values are close to the asymptotic ones, showing
that the Central Limit Theorem is already a good approximation. This
feature is due to the use of reasonably large sample sizes in all tests, despite
the fact that the order statistics are known to be non-normal.
4.2 The Italian lotto game



In Italy, the lotto game is a 5/90 game and has been available on several
wheels since at least 1863. As mentioned in the introduction, the game has a
long history, and similar games have been played in Italy at least since 1630.
We consider in this paper only one wheel, the Rome wheel, and the same
periods of time as for the Mexican lotto. The choice of 4 periods is motivated
by the need to reproduce sample sizes similar to those of the previous
analysis on the Mexican data. The historical results from January 7th, 1939
are available at www.lottomatica.it, the official web-site of the game. The
results are summarized in Tables 3 and 4. The data are analyzed with the
same procedure as discussed in the Mexican case.

Period     ȳ(1)     ȳ(2)     ȳ(3)     ȳ(4)     ȳ(5)   Draws
P1       17.251   33.827   49.316   64.191   77.713     450
P2       14.858   30.270   45.622   60.643   76.192     788
P3       15.401   29.930   45.059   60.763   76.072     359
P4       15.054   31.517   45.698   59.670   75.095     315

Table 3
Average results from the Italian lotto game. August 19, 1984 to December 30, 2007.

Period        Q    CLT p-value    MC p-value
P1        31.17       < 10^-5
P2         2.07         0.8387        0.8438
P3         1.62         0.8991        0.8962
P4         8.05         0.1535        0.1576

Table 4
Calculated Q-statistic and associated p-values for each group of results from the
Italian lotto game. CLT p-value is the Central Limit Theorem-based p-value and
MC p-value is the Monte Carlo approximated p-value.

From Table 4, we see that the data in the period August 19th, 1984 until
April 4th, 1993 produce a Q-statistic of 31.17, with a p-value near zero. This
means that in the decade 1984-1993 the data do not agree with the hypothesis
of fairness in the draw of the 90 balls.
5 Conclusions



In this paper we have presented an empirical test of randomness applied to 
historical data samples taken from Mexican and Italian institutional lotter- 
ies. The theoretical mean vector and covariance matrix for the random vector 
representing the outcome in lotto k/N games for these two sets of data were 
obtained. Also, and in order to test consistency in our statistical procedure, 
Monte Carlo data were generated by simulating a lottery game and compared 
to data. Application of this procedure to computer-generated random num- 
bers is suitable as a test of randomness for the corresponding pseudorandom 
algorithms. 

For certain periods, statistical evidence was found that the observed average 
vectors of outcomes significantly differ from their theoretical values. The odds
associated with the observed difference for the Mexican historical data are less
than 1 in 178; roughly speaking, if during the next 356 years we could apply
this test to results corresponding to non-overlapping two-year periods, only 
in one case would we expect to obtain a difference as large as the one found 
here. An even worse situation was detected in one period of the Italian 5/90 
lottery for the Rome wheel. 

The above results are important from the practical point of view, considering 
that Lotto games are relevant sources of income both for local and national 
governments in many countries around the world. The regular use of auditing 
procedures is recommended; monitoring the historical results with the aid of 
multivariate statistical procedures will help in improving the quality of the
service by detecting possible deviations from the desired ideal behaviour and 
in strengthening the confidence of the general public in institutional lottery 
agencies. The cases where the observed results are highly unlikely under fair- 
ness assumptions, as those illustrated here, should be further investigated in 
order to detect the sources of bias. 